So, let's talk about why Core is different from Pentium 4 and Pentium M and, to some degree, Athlon 64.
Wide Dynamic Execution
Under this heading there are a number of improvements that Intel has made in Core.
'Pipe' width: Just like a graphics card, a CPU has a number of 'pipes', and the number of pipes in the processor dictates how many instructions it can work on at the same time, which sets its peak instructions per clock (IPC). In Core, Intel has simply added another pipe to do more.
Pentium 4 NetBurst and Pentium M mobile chips could each handle three instructions per clock, whereas Core chips can handle four. You do the math.
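Doing that math as a quick sketch: the Python below compares the best case for a three-wide and a four-wide design. It assumes every instruction is independent and nothing ever stalls, which real code never manages, so treat the results as upper bounds. The function names and the 1,200-instruction workload are made up for illustration.

```python
def min_cycles(n_instructions: int, width: int) -> int:
    """Fewest clock cycles needed to issue n independent instructions
    on a processor that can handle `width` instructions per clock."""
    return -(-n_instructions // width)  # ceiling division


def peak_rate(width: int, clock_hz: float) -> float:
    """Theoretical peak instructions per second (width x clock speed)."""
    return width * clock_hz


if __name__ == "__main__":
    # Hypothetical workload of 1,200 independent instructions.
    for name, width in (("3-wide", 3), ("4-wide", 4)):
        print(name, min_cycles(1200, width), "cycles")
```

In this idealised case the extra pipe cuts the cycle count by a quarter at the same clock speed. Real-world gains are smaller, because instructions often have to wait on each other or on memory.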
Pipeline reduction: As instructions enter the processor, they need to be carried out. Rather than doing all the work on an instruction in one go, the processor splits the job into stages, and the number of stages is the length of the pipe. For example, in a 31-stage pipeline, as you have in the Pentium 4 Prescott, each instruction passes through 31 stages, moving on one stage per clock cycle, with many instructions in flight at different stages at once. That's a lot of stages. The advantage of a long pipeline is that each stage does very little work, so the chip can be clocked faster. As long as the pipe stays full of useful instructions, a long pipeline matched with a high clock speed makes for high throughput.
However, the downside of a long pipeline shows up when the processor guesses wrong about which way a branch in the code will go. Everything it has started working on behind that branch has to be thrown away and the pipeline refilled from the start, and the longer the pipe, the more clock cycles are lost each time. With 31 stages, this is unnecessarily slow.
Core chips have a 14-stage pipeline. Intel considers this an efficient compromise between reaching high clock speeds and keeping the cost of refilling the pipeline low, and it should result in better performance.
Since there are fewer stages, there is also less pipeline circuitry on the chip, which means lower power, which makes for more efficiency.
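To put rough numbers on the pipeline-length trade-off, one common back-of-envelope model looks at branch mispredictions. When the processor guesses wrong about a branch, the instructions in flight are discarded and the pipe has to refill, costing roughly as many cycles as there are stages. The sketch below assumes the cost equals the full pipeline depth, and the branch frequency and misprediction rate are made-up illustrative figures, not measurements of any chip.

```python
def effective_cpi(base_cpi: float,
                  branch_fraction: float,
                  mispredict_rate: float,
                  flush_penalty_cycles: int) -> float:
    """Average cycles per instruction once misprediction flushes are counted.

    base_cpi             -- cycles per instruction with perfect prediction
    branch_fraction      -- share of instructions that are branches
    mispredict_rate      -- share of branches that are guessed wrong
    flush_penalty_cycles -- cycles lost per wrong guess (assumed here to
                            equal the pipeline depth)
    """
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


if __name__ == "__main__":
    # Illustrative assumptions: 20% branches, 5% of them mispredicted.
    for name, depth in (("31-stage", 31), ("14-stage", 14)):
        cpi = effective_cpi(1.0, 0.20, 0.05, depth)
        print(f"{name}: {cpi:.2f} cycles per instruction")
```

Under these assumptions the longer pipe loses more cycles per instruction to flushes, so it needs a noticeably higher clock speed just to break even.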
Ops fusion: Fusion is a fairly simple thing to understand. When the processor receives a standard x86 instruction, it breaks it down into one or more simpler, processor-specific instructions called micro-ops for processing in accordance with the microarchitecture, as we explained above.
Micro-ops fusion allows the processor to keep together micro-ops that came from the same x86 instruction, reducing the amount of work the processor has to do. If an instruction breaks down into, say, a memory load and an arithmetic operation, the two micro-ops are fused and tracked as one through most of the pipeline, and are only split apart when they are executed. This was introduced in the Pentium M processor, and is one of the reasons that it's so comparatively fast next to the Pentium 4.
Core takes this idea one step further, and introduces macro-ops fusion, too. This type of fusion works on the x86 instructions themselves, not just their micro derivatives. Common instruction pairs, such as a compare followed by a conditional jump, can be combined into a single micro-op. This, again, makes for less work for the processor pipeline to do.
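As a toy illustration of the macro-ops fusion idea, the Python below scans a list of x86 mnemonics and merges a compare or test with the conditional jump that immediately follows it. This is a simplified model and not Intel's decoder logic: the real hardware has further restrictions on which pairs qualify, and the names `FUSIBLE_FIRST`, `is_conditional_jump` and `macro_fuse` are hypothetical.

```python
# Instructions assumed able to fuse with a following conditional jump.
FUSIBLE_FIRST = {"cmp", "test"}


def is_conditional_jump(mnemonic: str) -> bool:
    """Treat any j* mnemonic other than the unconditional jmp as conditional."""
    return mnemonic.startswith("j") and mnemonic != "jmp"


def macro_fuse(instructions: list[str]) -> list[str]:
    """Return the instruction stream with fusible pairs merged into one entry."""
    fused = []
    i = 0
    while i < len(instructions):
        current = instructions[i]
        following = instructions[i + 1] if i + 1 < len(instructions) else None
        if (current in FUSIBLE_FIRST
                and following is not None
                and is_conditional_jump(following)):
            fused.append(f"{current}+{following}")  # one slot instead of two
            i += 2
        else:
            fused.append(current)
            i += 1
    return fused


if __name__ == "__main__":
    stream = ["mov", "cmp", "jne", "add", "test", "jz", "jmp"]
    print(macro_fuse(stream))
```

Here seven instructions go in and five entries come out, so the rest of the pipeline has two fewer operations to track.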
Better power management
Previous versions of Intel's mobile chips have included SpeedStep, a technology that clocks the processor down if it's not doing much. Core takes this much further. It has a very fine-grained controller that turns on sections of the processor only when they are needed. It's possible to shut one core right down if you're only using a single-threaded application. The use of fast-switching transistors means the processor can be 'off' as much of the time as possible, without affecting performance.